NSF PAR Search | NSF Public Access Repository

Can LLMs Implicitly Learn Numeric Parameter Constraints in Data Science APIs?

Deng, Yinlin; Xia, Chunqiu Steven; Cao, Zhezhen; Li, Meiziniu; Zhang, Lingming (December 2024, NeurIPS 2024 (Curran Associates))

Full Text Available
Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM

Xia, Chunqiu Steven; Deng, Yinlin; Zhang, Lingming (August 2024, OpenReview)

Full Text Available
WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models

https://doi.org/10.1145/3689736

Yang, Chenyuan; Deng, Yinlin; Lu, Runyu; Yao, Jiayi; Liu, Jiawei; Jabbarvand, Reyhaneh; Zhang, Lingming (October 2024, Proceedings of the ACM on Programming Languages)

Compiler correctness is crucial, as miscompilation can alter program behavior, with serious consequences across the software supply chain. In the literature, fuzzing has been extensively studied to uncover compiler defects. However, compiler fuzzing remains challenging: existing approaches focus on black- and grey-box fuzzing, which generate test programs without sufficient understanding of internal compiler behavior. As such, they often fail to construct test programs that exercise intricate optimizations. Meanwhile, traditional white-box techniques, such as symbolic execution, are computationally infeasible for the giant codebases of compiler systems. Recent advances demonstrate that Large Language Models (LLMs) excel in code generation/understanding tasks and have even achieved state-of-the-art performance in black-box fuzzing. Nonetheless, guiding LLMs with compiler source-code information remains a missing piece of research in compiler testing. To this end, we propose WhiteFox, the first white-box compiler fuzzer using LLMs with source-code information to test compiler optimizations, with a spotlight on detecting deep logic bugs in the emerging deep learning (DL) compilers. WhiteFox adopts a multi-agent framework: (i) an LLM-based analysis agent examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) an LLM-based generation agent produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are used as feedback to further enhance the test generation prompt on the fly. Our evaluation on the three most popular DL compilers (i.e., PyTorch Inductor, TensorFlow-XLA, and TensorFlow Lite) shows that WhiteFox can generate high-quality test programs to exercise deep optimizations requiring intricate conditions, triggering up to 8 times more optimizations than state-of-the-art fuzzers. To date, WhiteFox has found a total of 101 bugs in the compilers under test, with 92 confirmed as previously unknown and 70 already fixed. Notably, WhiteFox has recently been acknowledged by the PyTorch team and is in the process of being incorporated into its development workflow. Finally, beyond DL compilers, WhiteFox can also be adapted to compilers in other domains, such as LLVM, where it has already found multiple bugs.
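The two-agent loop described in this abstract can be illustrated with a minimal sketch. This is not the WhiteFox implementation: the prompt wording, the `llm` callable, the `triggers` oracle, and every function and class name below are hypothetical placeholders chosen for illustration. The sketch only shows the control flow: analyze the optimization source once, generate tests from the resulting requirement, and feed optimization-triggering tests back into the generation prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-in for an LLM client: takes a prompt, returns text.
LLM = Callable[[str], str]


@dataclass
class FuzzState:
    requirement: str  # summary produced by the analysis agent
    triggering_examples: List[str] = field(default_factory=list)


def analyze_optimization(llm: LLM, optimization_source: str) -> str:
    """Analysis agent: summarize what a test program needs to trigger the optimization."""
    prompt = (
        "Summarize the requirements a high-level test program must meet "
        "to trigger this optimization:\n" + optimization_source
    )
    return llm(prompt)


def build_generation_prompt(state: FuzzState, max_examples: int = 3) -> str:
    """Combine the requirement with the most recent triggering tests (the feedback)."""
    parts = ["Requirement:\n" + state.requirement]
    for example in state.triggering_examples[-max_examples:]:
        parts.append("Test that triggered the optimization:\n" + example)
    parts.append("Write a new test program that satisfies the requirement.")
    return "\n\n".join(parts)


def fuzz_optimization(
    llm: LLM,
    optimization_source: str,
    triggers: Callable[[str], bool],  # assumed oracle: did the test hit the optimization?
    rounds: int,
) -> Tuple[FuzzState, List[str]]:
    """Generation agent loop with on-the-fly prompt feedback."""
    state = FuzzState(requirement=analyze_optimization(llm, optimization_source))
    tests: List[str] = []
    for _ in range(rounds):
        test = llm(build_generation_prompt(state))
        tests.append(test)
        if triggers(test):
            state.triggering_examples.append(test)
    return state, tests
```

In a real setting, `triggers` would need instrumentation inside the compiler to report whether the target optimization fired, and each generated test would also be checked for crashes or inconsistent results; both are omitted here.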
Full Text Available
Large Language Models are Edge-Case Generators: Crafting Unusual Programs for Fuzzing Deep Learning Libraries

https://doi.org/10.1145/3597503.3623343

Deng, Yinlin; Xia, Chunqiu Steven; Yang, Chenyuan; Zhang, Shizhuo Dylan; Yang, Shujing; Zhang, Lingming (February 2024, ACM)

Full Text Available
Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models

https://doi.org/10.1145/3597926.3598067

Deng, Yinlin; Xia, Chunqiu Steven; Peng, Haoran; Yang, Chenyuan; Zhang, Lingming (July 2023, ACM)

Full Text Available
Fuzzing Automatic Differentiation in Deep-Learning Libraries

Yang, Chenyuan; Deng, Yinlin; Yao, Jiayi; Tu, Yuxing; Li, Hanchi; Zhang, Lingming (July 2023, Proceedings of the IEEE/ACM International Conference on Software Engineering)

Full Text Available
Free lunch for testing: fuzzing deep-learning libraries from open source

https://doi.org/10.1145/3510003.3510041

Wei, Anjiang; Deng, Yinlin; Yang, Chenyuan; Zhang, Lingming (May 2022, International Conference on Software Engineering)

Full Text Available
Coverage-guided tensor compiler fuzzing with joint IR-pass mutation

https://doi.org/10.1145/3527317

Liu, Jiawei; Wei, Yuxiang; Yang, Sen; Deng, Yinlin; Zhang, Lingming (April 2022, Proceedings of the ACM on Programming Languages)

In the past decade, Deep Learning (DL) systems have been widely deployed in various application domains to facilitate daily life, e.g., natural language processing, healthcare, activity recognition, and autonomous driving. Meanwhile, it is extremely challenging to ensure the correctness of DL systems (e.g., due to their intrinsic nondeterminism), and bugs in DL systems can cause serious consequences and may even threaten human lives. In the literature, researchers have explored various techniques to test, analyze, and verify DL models, since their quality directly affects the corresponding system behaviors. Recently, researchers have also proposed novel techniques for testing the underlying operator-level DL libraries (such as TensorFlow and PyTorch), which provide general binary implementations for each high-level DL operator and are the foundation for running DL models on different hardware platforms. However, there is still limited work targeting the reliability of the emerging tensor compilers (also known as DL compilers), which aim to automatically compile high-level tensor computation graphs directly into high-performance binaries for better efficiency, portability, and scalability than traditional operator-level libraries. Therefore, in this paper, we target the important problem of tensor compiler testing and propose Tzer, a practical fuzzing technique for the widely used TVM tensor compiler. Tzer focuses on mutating TVM's low-level Intermediate Representation (IR) because the high-level IR offers only a limited mutation space. More specifically, Tzer leverages both general-purpose and tensor-compiler-specific mutators guided by coverage feedback for diverse and evolutionary IR mutation; furthermore, since tensor compilers provide various passes (i.e., transformations) for IR optimization, Tzer also performs pass mutation in tandem with IR mutation for more effective fuzzing. Our experimental results show that Tzer substantially outperforms existing fuzzing techniques on tensor compiler testing, with 75% higher coverage and 50% more valuable tests than the second-best technique. The individual components of Tzer have also been validated via an ablation study. To date, Tzer has detected 49 previously unknown bugs in TVM, with 37 bugs confirmed and 25 bugs fixed (PR merged).
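The idea of coverage-guided fuzzing with joint IR and pass mutation can be sketched on a toy model. This is not Tzer's code and does not use TVM: the "IR" here is just a non-empty tuple of integers, passes are plain strings, and `run_compiler`, `mutate_ir`, `mutate_passes`, and `fuzz` are hypothetical names. The sketch assumes a simple policy of mutating either the IR or the pass sequence in each iteration and keeping a mutant only when it reaches new coverage.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

# Toy seed: (IR as a non-empty tuple of ints, pass sequence as a tuple of names).
Seed = Tuple[Tuple[int, ...], Tuple[str, ...]]


def mutate_ir(ir: Tuple[int, ...], rng: random.Random) -> Tuple[int, ...]:
    """Toy IR mutator: overwrite one position with a random value (IR must be non-empty)."""
    items = list(ir)
    items[rng.randrange(len(items))] = rng.randrange(16)
    return tuple(items)


def mutate_passes(
    passes: Tuple[str, ...], available: Sequence[str], rng: random.Random
) -> Tuple[str, ...]:
    """Toy pass mutator: drop one pass or insert one from the available set."""
    items = list(passes)
    if items and rng.random() < 0.5:
        items.pop(rng.randrange(len(items)))
    else:
        items.insert(rng.randrange(len(items) + 1), rng.choice(available))
    return tuple(items)


def fuzz(
    initial: Seed,
    available_passes: Sequence[str],
    run_compiler: Callable[[Tuple[int, ...], Tuple[str, ...]], Set],  # returns covered items
    iterations: int,
    seed: int = 0,
) -> Tuple[List[Seed], Set]:
    """Keep a seed pool; retain a mutant only if it adds coverage."""
    rng = random.Random(seed)
    pool: List[Seed] = [initial]
    covered: Set = set(run_compiler(*initial))
    for _ in range(iterations):
        ir, passes = rng.choice(pool)
        # Joint mutation: each iteration mutates either the IR or the pass sequence.
        if rng.random() < 0.5:
            ir = mutate_ir(ir, rng)
        else:
            passes = mutate_passes(passes, available_passes, rng)
        new_coverage = run_compiler(ir, passes)
        if not new_coverage <= covered:  # at least one newly covered item
            covered |= new_coverage
            pool.append((ir, passes))
    return pool, covered
```

A real fuzzer would obtain coverage from an instrumented compiler build and would also record crashes and miscompilations for each mutant; the toy `run_compiler` callback stands in for both.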
Full Text Available
